Results 1 - 20 of 31
1.
5th International Conference on Artificial Intelligence in Information and Communication, ICAIIC 2023 ; : 444-447, 2023.
Article in English | Scopus | ID: covidwho-2306891

ABSTRACT

Sentiment analysis has a critical role to reveal an opinion in a text-based form. Therefore, we exploit this analysis to discover the sentiment polarity of Taiwan Social Distancing mobile application. This paper proposes a semi-supervised scheme for annotating this mobile application's reviews. The semi-supervised scheme utilized a combination of numeric rating and lexicon-based sentiment. In addition, we also perform the sentiment analysis on an aspect-based level. Based on the experiment, we decide to select three aspects to be analyzed. This paper also evaluates the proposed scheme by implementing bidirectional encoder representations from transformers (BERT) and multilayer perceptron (MLP) as the classification model using the sentiment label of the proposed scheme. The result shows that the annotation of the proposed scheme outperforms the data annotation using counterpart models. © 2023 IEEE.

2.
JAMIA Open ; 6(2): ooad023, 2023 Jul.
Article in English | MEDLINE | ID: covidwho-2306120

ABSTRACT

Objective: To develop and apply a natural language processing (NLP)-based approach to analyze public sentiments on social media and their geographic pattern in the United States toward coronavirus disease 2019 (COVID-19) vaccination. We also aim to provide insights to facilitate the understanding of the public attitudes and concerns regarding COVID-19 vaccination. Methods: We collected Tweet posts by the residents in the United States after the dissemination of the COVID-19 vaccine. We performed sentiment analysis based on the Bidirectional Encoder Representations from Transformers (BERT) and qualitative content analysis. Time series models were leveraged to describe sentiment trends. Key topics were analyzed longitudinally and geospatially. Results: A total of 3 198 686 Tweets related to COVID-19 vaccination were extracted from January 2021 to February 2022. 2 358 783 Tweets were identified to contain clear opinions, among which 824 755 (35.0%) expressed negative opinions towards vaccination while 1 534 028 (65.0%) demonstrated positive opinions. The accuracy of the BERT model was 79.67%. The key hashtag-based topics include Pfizer, breaking, wearamask, and smartnews. The sentiment towards vaccination across the states showed manifest variability. Key barriers to vaccination include mistrust, hesitancy, safety concern, misinformation, and inequity. Conclusion: We found that opinions toward the COVID-19 vaccination varied across different places and over time. This study demonstrates the potential of an analytical pipeline, which integrates NLP-enabled modeling, time series, and geospatial analyses of social media data. Such analyses could enable real-time assessment, at scale, of public confidence and trust in COVID-19 vaccination, help address the concerns of vaccine skeptics, and provide support for developing tailored policies and communication strategies to maximize uptake.
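
The reported shares of negative and positive Tweets can be checked directly from the counts given in the abstract. The following is a minimal sketch of that arithmetic; the helper name `share` and the variable names are illustrative assumptions, not part of the study's pipeline.

```python
# Counts as reported in the abstract above.
total_with_opinion = 2_358_783
negative = 824_755
positive = 1_534_028


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# The two opinion classes should add up to the Tweets with a clear opinion.
counts_are_consistent = (negative + positive == total_with_opinion)

negative_share = share(negative, total_with_opinion)
positive_share = share(positive, total_with_opinion)

print(counts_are_consistent, negative_share, positive_share)
```

Both shares are computed over the Tweets identified as containing clear opinions, not over all 3 198 686 extracted Tweets.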

3.
Information & Management ; 59(2):1-18, 2022.
Article in English | APA PsycInfo | ID: covidwho-2254327

ABSTRACT

This study investigates customer satisfaction through aspect-level sentiment analysis and visual analytics. We collected and examined the flight reviews on TripAdvisor from January 2016 to August 2020 to gauge the impact of COVID-19 on passenger travel sentiment in several aspects. Till now, information systems, management, and tourism research have paid little attention to the use of deep learning and word embedding techniques, such as bidirectional encoder representations from transformers, especially for aspect-level sentiment analysis. This paper aims to identify perceived aspect-based sentiments and predict unrated sentiments for various categories to address this research gap. Ultimately, this study complements existing sentiment analysis methods and extends the use of data-driven and visual analytics approaches to better understand customer satisfaction in the airline industry and within the context of the COVID-19. Our proposed method outperforms baseline comparisons and therefore contributes to the theoretical and managerial literature. (PsycInfo Database Record (c) 2023 APA, all rights reserved)

4.
Data ; 8(3), 2023.
Article in English | Scopus | ID: covidwho-2288144

ABSTRACT

To address the COVID-19 situation in Indonesia, the Indonesian government has adopted a number of policies. One of them is a vacation-related policy. Government measures with regard to this vacation policy have produced a wide range of viewpoints in society, which have been extensively shared on social media, including YouTube. However, there has not been any computerized system developed to date that can assess people's social media reactions. Therefore, this paper provides a sentiment analysis application to this government policy by employing a bidirectional encoder representation from transformers (BERT) approach. The study method began with data collecting, data labeling, data preprocessing, BERT model training, and model evaluation. This study created a new dataset for this topic. The data were collected from the comments section of YouTube, and were categorized into three categories: positive, neutral, and negative. This research yielded an F-score of 84.33%. Another contribution from this study regards the methodology for processing sentiment analysis in Indonesian. In addition, the model was created as an application using the Python programming language and the Flask framework. The government can learn the extent to which the public accepts the policies that have been implemented by utilizing this research. © 2023 by the authors.

5.
8th China Conference on China Health Information Processing, CHIP 2022 ; 1772 CCIS:82-94, 2023.
Article in English | Scopus | ID: covidwho-2286086

ABSTRACT

For the purpose of capturing the semantic information accurately and clarifying the user's questioning intention, this paper proposes a novel, ensemble deep architecture BERT-MSBiLSTM-Attentions (BMA) which uses the Bidirectional Encoder Representations from Transformers (BERT), Multi-layer Siamese Bi-directional Long Short Term Memory (MSBiLSTM) and dual attention mechanism (Attentions) in order to solve the current question semantic similarity matching problem in medical automatic question answering system. In the preprocessing part, we first obtain token-level and sentence-level embedding vectors that contain rich semantic representations of complete sentences. The fusion of more accurate and adequate semantic features obtained through Siamese recurrent network and dual attention network can effectively eliminate the effect of poor matching results due to the presence of certain non-canonical texts or the diversity of their expression ambiguities. To evaluate our model, we splice the dataset of Ping An Healthkonnect disease QA transfer learning competition and "public AI star" challenge - COVID-19 similar sentence judgment competition. Experimental results with CC19 dataset show that BMA network achieves significant performance improvements compared to existing methods. © 2023, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

6.
Sustainability (Switzerland) ; 15(5), 2023.
Article in English | Scopus | ID: covidwho-2249257

ABSTRACT

Global natural and manmade events are exposing the fragility of the tourism industry and its impact on the global economy. Prior to the COVID-19 pandemic, tourism contributed 10.3% to the global GDP and employed 333 million people but saw a significant decline due to the pandemic. Sustainable and smart tourism requires collaboration from all stakeholders and a comprehensive understanding of global and local issues to drive responsible and innovative growth in the sector. This paper presents an approach for leveraging big data and deep learning to discover holistic, multi-perspective (e.g., local, cultural, national, and international), and objective information on a subject. Specifically, we develop a machine learning pipeline to extract parameters from the academic literature and public opinions on Twitter, providing a unique and comprehensive view of the industry from both academic and public perspectives. The academic-view dataset was created from the Scopus database and contains 156,759 research articles from 2000 to 2022, which were modelled to identify 33 distinct parameters in 4 categories: Tourism Types, Planning, Challenges, and Media and Technologies. A Twitter dataset of 485,813 tweets was collected over 18 months from March 2021 to August 2022 to showcase the public perception of tourism in Saudi Arabia, which was modelled to reveal 13 parameters categorized into two broader sets: Tourist Attractions and Tourism Services. The paper also presents a comprehensive knowledge structure and literature review of the tourism sector based on over 250 research articles. Discovering system parameters is required to embed autonomous capabilities in systems and for decision-making and problem-solving during system design and operations. The work presented in this paper has significant theoretical and practical implications in that it improves AI-based information discovery by extending the use of scientific literature, Twitter, and other sources for autonomous, holistic, dynamic optimizations of systems, promoting novel research in the tourism sector and contributing to the development of smart and sustainable societies. © 2023 by the authors.

7.
ICIC Express Letters ; 17(2):171-179, 2023.
Article in English | Scopus | ID: covidwho-2245508

ABSTRACT

The COVID-19 pandemic has undoubtedly affected people's lifestyles and stock investment activities. Government policies to deal with the pandemic have contributed to an increase in the number of investors in the stock market. Apart from profits, there are also risks associated with investing in stocks. Reducing this risk requires analysis for stock price prediction. The data often used are stock data, commodity prices, and social media. The application of deep learning and natural language processing can help investors to process data. This paper proposes Convolutional Neural Network and Long Short-Term Memory (CNN-LSTM) for technical analysis, predicting stock prices from stock and commodity price data, and uses BERT for sentiment analysis of social media data. The CNN-LSTM method performs best compared to the other four methods: its Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) were the smallest, and its R Square (R2) was the largest. The BERT method achieves its best classification performance with 5 epochs, giving the highest macro average, weighted average, accuracy, and F1-score. CNN-LSTM and BERT are more appropriate to predict stock prices and give investors suggestions to make stock investment decisions based on technical analysis and sentiment analysis. © 2023 ICIC International. All rights reserved.

8.
IEEE Transactions on Computational Social Systems ; 2022.
Article in English | Web of Science | ID: covidwho-2213374

ABSTRACT

Since so many social media users are not qualified to report news, fake news has become a major problem in recent years. Therefore, it is crucial to identify and restrict the dissemination of false information. Numerous deep learning models that make use of natural language processing have yielded excellent results in the detection of fake news. Bidirectional encoder representations from transformers (BERT), based on transfer learning, is one of the most advanced models. In this work, the researchers have compared the earlier studies that employed baseline models versus the research articles where the researchers used a pretrained model BERT for the detection of fake news. The literature analysis revealed that utilizing pretrained algorithms is more effective at identifying fake news because it takes less time to train them and yields better results. Based on the results noted in this article, the researchers have advised the utilization of pretrained models that take advantage of transfer learning, which shortens training time and enables the use of large datasets, as well as a reputable model that performs well in terms of precision, recall, and the minimum number of false positive and false negative outputs. As a result, the researchers created an improved BERT model, while considering fine-tuning it to meet the demands of the fake news identification task. To obtain the most accurate representation of the input text, the final layer of this model is also unfrozen and trained on news texts. The dataset used in the study included 23 502 articles of fake news and 21 417 items of actual news. This dataset was downloaded from the Kaggle website. The results of this study demonstrated that the proposed model showed a better performance compared with other models, and achieved 99.96% and 99.96% in terms of accuracy and F1 score, respectively.

9.
4th International Conference on Inventive Research in Computing Applications, ICIRCA 2022 ; : 1214-1219, 2022.
Article in English | Scopus | ID: covidwho-2213283

ABSTRACT

The Coronavirus Disease 2019 (COVID-19) is one of the worst outbreaks on record in the world. The assessment of public sentiment during the outbreak helps public health officials make the appropriate decision. Twitter is a famous social media platform where people share their views, and it is utilized to identify public sentiments. The Twitter posts regarding COVID-19 (December 2020 to April 2021) are randomly selected and categorized based on four types of sentiments (Awareness, Irrelevant, Report, and Treatment). This paper proposes a Deep Convolutional Neural Network (DCNN) based on Bidirectional Encoder Representations from Transformers (BERT), which is known as DCNN-BERT for deploying three layers of CNN with BERT in order to fine-tune the appropriate input and output layers to provide cutting-edge models for performing a variety of text analysis and Natural Language Processing (NLP) tasks. The proposed model is compared with state-of-the-art Machine Learning (ML) models by analyzing the well-known performance metrics like Precision, Recall, F1-Score, and Accuracy. The empirical results indicate that the proposed model provides 0.87, 0.88, 0.87, and 0.85 as accuracy, precision, recall, and F1-score respectively. © 2022 IEEE.

10.
1st International Conference on Ambient Intelligence in Health Care, ICAIHC 2021 ; 317:317-324, 2023.
Article in English | Scopus | ID: covidwho-2173922

ABSTRACT

The COVID-19 pandemic has affected the lives of people across the globe. People belonging to all the sectors of the society have faced a lot of challenges. Strict measures like lockdown and social distancing have been imposed several times by governments throughout the world. Universities had to incorporate the online method of teaching instead of the regular offline classes to implement social distancing. Online classes were beneficial to most of the students; at the same time, there were many difficulties faced by the students due to lack of facilities to attend classes online. Students faced a lot of challenges, and a sense of anxiety was prevalent during the uncertain times of the pandemic. This research article analyzes the stress among students considering the tweets across the globe related to student stress. The algorithms considered for classification of tweets as positive or negative are support vector machine (SVM), bidirectional encoder representation from transformers (BERT), and long short-term memory (LSTM). The accuracy of the abovementioned algorithms is compared. © 2023, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

11.
18th International Conference on Advanced Data Mining and Applications, ADMA 2022 ; 13725 LNAI:106-117, 2022.
Article in English | Scopus | ID: covidwho-2173834

ABSTRACT

The coronavirus pandemic has caused a worldwide crisis and a drastic change in day-to-day life activities. Worldwide, people use social media platforms to share and discuss their opinions about the situation. Twitter is one such platform for public conversation around the coronavirus pandemic, the spread of disease, vaccination, non-pharmaceutical interventions, and many other discussions. In this study, we use Twitter social media data for sentiment analysis. The tweets are collected based on COVID-19-related hashtags. This work presents a deep learning-based framework for sentiment analysis using DistilBERT, a distilled version of Bidirectional Encoder Representation from Transformers (BERT), Convolutional Neural Network (CNN), and Long Short Term Memory (LSTM). The results show that transformer-based pre-processing and fine-tuning yield better performance results. The DistilBERT model yields the highest accuracy of 91.46% compared to the CNN and LSTM models. © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.

12.
2022 International Conference on Edge Computing and Applications, ICECAA 2022 ; : 1559-1564, 2022.
Article in English | Scopus | ID: covidwho-2152470

ABSTRACT

Worldwide, the COVID-19 pandemic has affected people's daily routines. In general, and also during lockdown periods, people around the world use social media to express their thoughts and feelings about the epidemic that has interrupted their daily lives. There has been a huge spike in tweets about coronavirus on Twitter in a short period of time, including both positive and negative messages. As a result of the wide range of content in the tweets, the researchers have turned to sentiment analysis in order to gauge how the general public feels about COVID-19. According to the findings of this study, the best way to examine COVID-19 is to look at how people use Twitter to share their thoughts and opinions. Sentiment categorization can be accomplished by utilising a variety of feature sets as well as classifiers in combination with the suggested approach. Tweets collected from people with COVID-19 perceptions can be used to better understand and manage the epidemic. Positive, negative, as well as neutral emotion classifications are being used to classify tweets. In this study, Tweets containing specific information about the Coronavirus epidemic are used as sentiment analysis packages. Bidirectional Encoder Representations from Transformers (BERT) is used to identify sentiment categories, whereas the TF-IDF (term frequency-inverse document frequency) prototype is used to summarise the topics of postings. Trend analysis and qualitative methods are being used to identify negative sentiment traits. In general, when it comes to sentiment classification, the fine-tuned BERT is very accurate. In addition, the COVID-19-related post features of TF-IDF themes are accurately conveyed. Coronavirus tweet sentiments are analysed using a BERT and TF-IDF hybrid classifier. Single-sentence classification is transformed into pair-sentence classification, which solves BERT's performance issue in text classification problems. Our evaluation measures (accuracy = 0.70; precision = 0.67; recall = 0.64; and F1-score = 0.65) are used to evaluate the effectiveness of the classifier. © 2022 IEEE.
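
The F1-score reported in this abstract is consistent with the standard definition of F1 as the harmonic mean of precision and recall. The sketch below illustrates that relationship using the reported values; the function and variable names are illustrative assumptions. If the study's figures are averages over several classes, the per-class scores need not combine in exactly this way.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract above.
reported_precision = 0.67
reported_recall = 0.64

f1 = f1_score(reported_precision, reported_recall)
f1_rounded = round(f1, 2)
print(f1_rounded)
```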

13.
Int J Med Inform ; 170: 104956, 2023 02.
Article in English | MEDLINE | ID: covidwho-2149862

ABSTRACT

BACKGROUND: Owing to the prevalence of the coronavirus disease (COVID-19), coping with clinical issues at the individual level has become important to the healthcare system. Accordingly, precise initiation of treatment after a hospital visit is required for expedited processes and effective diagnoses of outpatients. To achieve this, artificial intelligence in medical natural language processing (NLP), such as a healthcare chatbot or a clinical decision support system, can be suitable tools for an advanced clinical system. Furthermore, support for decisions on the medical specialty from the initial visit can be helpful. MATERIALS AND METHODS: In this study, we propose a medical specialty prediction model from patient-side medical question text based on pre-trained bidirectional encoder representations from transformers (BERT). The dataset comprised pairs of medical question texts and labeled specialties scraped from a website for the medical question-and-answer service. The model was fine-tuned for predicting the required medical specialty labels among 27 labels from medical question texts. To demonstrate the feasibility, we conducted experiments on a real-world dataset and elaborately evaluated the predictive performance compared with four deep learning NLP models through cross-validation and test set evaluation. RESULTS: The proposed model showed improved performance compared with competitive models in terms of overall specialties. In addition, we demonstrate the usefulness of the proposed model by performing case studies for visualization applications. CONCLUSION: The proposed model can benefit hospital patient management and reasonable recommendations for specialties for patients.


Subject(s)
COVID-19 , Medicine , Humans , Artificial Intelligence , Adaptation, Psychological , Cognition , Natural Language Processing
14.
IAES International Journal of Artificial Intelligence ; 12(1):488-495, 2023.
Article in English | Scopus | ID: covidwho-2100401

ABSTRACT

Social media impacts society, whether positively, negatively, or both. It has become a key component of our lives and a vital news resource. The crisis of COVID-19 has impacted the lives of all people. The spread of misinformation causes confusion among individuals. Automated methods are therefore vital for detecting false claims and preventing the spread of misinformation. The COVID-19 news can be classified into two categories: false or real. This paper provides an automated misinformation checking system for the COVID-19 news. Five machine learning algorithms and deep learning models are evaluated. The proposed system uses the bidirectional encoder representations from transformers (BERT) with deep learning models. Detecting fake news using BERT is a fine-tuning task. BERT achieved an accuracy of 98.83% as a pre-trained model and classifier on the COVID-19 dataset. Better results are obtained using BERT with deep learning models, which achieved an accuracy of 99.1%. The results achieved improvements in the area of fake news detection. As another contribution, the proposed system allows users to check the credibility of claims. It finds the most related real news from experts to the fake claims and answers any question about COVID-19 using the universal-sentence-encoder model. © 2023, Institute of Advanced Engineering and Science. All rights reserved.

15.
International Journal of Advanced Computer Science and Applications ; 13(9):667-674, 2022.
Article in English | Scopus | ID: covidwho-2081044

ABSTRACT

Due to the COVID-19 pandemic, which started in the year 2020, many nations imposed lockdowns to curb the spread of the virus. People have been sharing their experiences and perspectives on the lockdown situation on social media. This has given rise to an increased number of tweets or posts on social media. Multi-class text classification, a method of classifying a text into one of the pre-defined categories, is one of the effective ways to analyze such data and is implemented in this paper. A COVID-19 dataset consisting of fifteen pre-defined categories is used in this work. This paper presents a multi-layered hybrid model, LSTM followed by GRU, to integrate the benefits of both techniques. Word embedding techniques such as GloVe and BERT were applied, and it was found that, for three epochs, the transfer learning based pre-trained BERT-hybrid model performs one percent better than the GloVe-hybrid model, but the state-of-the-art, fine-tuned BERT-base model outperforms the BERT-hybrid model by three percent, in terms of validation loss. It is expected that, over a larger number of epochs, the hybrid model might outperform the fine-tuned model. © 2022, International Journal of Advanced Computer Science and Applications. All Rights Reserved.

16.
IEEE Access ; 10:104156-104168, 2022.
Article in English | Web of Science | ID: covidwho-2070271

ABSTRACT

Named entity recognition on COVID-19 epidemiological investigation information can help analyze the source and route of transmission of the epidemic in order to better control its spread. Therefore, this paper proposes a Chinese named entity recognition model, BERT-BiLSTM-IDCNN-ELU-CRF (BBIEC), for COVID-19 epidemiological investigation information, based on the BERT pre-training model. The model first processes the unlabeled COVID-19 epidemiological investigation information into a character-level corpus and manually annotates its entities according to the BIOES character-level labeling system, and then uses the BERT pre-training model to obtain word vectors with position information. Next, a bidirectional long short-term memory neural network (BiLSTM) and an improved iterated dilated convolutional neural network (IDCNN) extract global context and local features from the generated word vectors, and these features are concatenated serially. All possible label sequences are then output to the conditional random field (CRF), which finally decodes and generates the entity tag sequence. The experimental results show that the model is better than other traditional models in recognizing entities in COVID-19 epidemiological investigation information.
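
The BIOES character-level labeling mentioned in this abstract can be illustrated with a short sketch. BIOES marks each character as Beginning, Inside, or End of a multi-character entity, as a Single-character entity, or as Outside any entity. The function name, the example sentence, and the entity labels `PER` and `LOC` are hypothetical and are not taken from the paper's annotation scheme.

```python
from typing import List, Tuple


def bioes_tags(length: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Tag `length` characters; each span is (start, end_exclusive, label)."""
    tags = ["O"] * length
    for start, end, label in spans:
        if end - start == 1:
            # A one-character entity gets the Single tag.
            tags[start] = f"S-{label}"
        else:
            tags[start] = f"B-{label}"
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"
            tags[end - 1] = f"E-{label}"
    return tags


# Hypothetical example sentence: "Zhang San went to Beijing".
text = "张三去了北京"
# Hypothetical annotations: a person name and a location.
spans = [(0, 2, "PER"), (4, 6, "LOC")]

tags = bioes_tags(len(text), spans)
for char, tag in zip(text, tags):
    print(char, tag)
```

In a pipeline like the one described, tag sequences of this kind would serve as the training targets that the CRF layer learns to decode.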

17.
Applied Sciences ; 12(17):8398, 2022.
Article in English | ProQuest Central | ID: covidwho-2023104

ABSTRACT

Fake news detection techniques are a topic of interest due to the vast abundance of fake news data accessible via social media. The present fake news detection system performs satisfactorily on well-balanced data. However, when the dataset is biased, these models perform poorly. Additionally, manual labeling of fake news data is time-consuming, though we have enough fake news traversing the internet. Thus, we introduce a text augmentation technique with a Bidirectional Encoder Representation of Transformers (BERT) language model to generate an augmented dataset composed of synthetic fake data. The proposed approach overcomes the issue of minority class and performs the classification with the AugFake-BERT model (trained with an augmented dataset). The proposed strategy is evaluated with twelve different state-of-the-art models. The proposed model outperforms the existing models with an accuracy of 92.45%. Moreover, accuracy, precision, recall, and f1-score performance metrics are utilized to evaluate the proposed strategy and demonstrate that a balanced dataset significantly affects classification performance.

18.
2022 IEEE International Conference on Electro Information Technology, eIT 2022 ; 2022-May:417-422, 2022.
Article in English | Scopus | ID: covidwho-1961372

ABSTRACT

The growth of social data on the internet has accelerated during the last two decades. As a result, researchers can access data and information for various academic and commercial purposes. The novel coronavirus disease (COVID-19) is a current pandemic that has sparked widespread concern worldwide. Spreading misleading information on social media platforms like Twitter, on the other hand, is exacerbating the disease's concern. This research aims to examine tweets and develop a model that can detect public sentiment from social media posts;consequently, necessary precautions can be taken to preserve adequate validity of information for the general public. We believe that various social media platforms have a significant impact on creating public awareness about the disease's importance and encouraging preventive measures among community members. For this study, we applied the Bidirectional Encoder Representations from Transformers (BERT) model, a new deep-learning technique for text analysis and performance with exceptional multi-class accuracy. We also compared it with six shallow machine learning models. © 2022 IEEE.

19.
New Gener Comput ; 40(4): 1165-1202, 2022.
Article in English | MEDLINE | ID: covidwho-1930400

ABSTRACT

Social media has emerged as an influential platform that allows people to share their views on global and local issues. Sentiment analysis can handle these massive amounts of unstructured reviews and convert them into meaningful opinions. Undoubtedly, COVID-19 emerged as an enormous challenge across the world that physically and financially brutalized humankind. Meanwhile, farmers' protests against three pieces of legislation passed by the Indian government shook up the world. Hence, an artificial intelligence-based sentiment model is needed for suggesting the right direction toward outbreaks. Although Deep Neural Networks (DNNs) have gained popularity in sentiment analysis applications, they still have the limitations of sequential training, high-dimension feature space, and equal feature importance distribution. In addition, inaccurate polarity scoring and utility-based topic modeling are other challenging aspects of sentiment analysis. This motivates us to propose a Knowledge-Enriched Attention-based Hybrid Transformer (KEAHT) model by enriching the explicit knowledge of Latent Dirichlet Allocation (LDA) topic modeling and lexicalized domain ontology. A pre-trained Bidirectional Encoder Representation from Transformer (BERT) is employed to train within a minimum training corpus. It provides the facility of attention mechanism and can solve complex text problems accurately. A comparative study with existing baselines and recent hybrid models affirms the credibility of the proposed KEAHT in the field of Natural Language Processing (NLP). This model emphasizes artificial intelligence's role in handling the situation of the global pandemic and democratic dispute in a country. Furthermore, two benchmark datasets, namely "COVID-19-Vaccine-Labelled-Tweets" and "Indian-Farmer-Protest-Labelled-Tweets", are also constructed to accommodate future researchers for outlining the essential facts associated with the outbreaks.

20.
JMIR Form Res ; 6(6): e34834, 2022 Jun 29.
Article in English | MEDLINE | ID: covidwho-1910880

ABSTRACT

BACKGROUND: In recent years, social media has become a major channel for health-related information in Saudi Arabia. Prior health informatics studies have suggested that a large proportion of health-related posts on social media are inaccurate. Given the subject matter and the scale of dissemination of such information, it is important to be able to automatically discriminate between accurate and inaccurate health-related posts in Arabic. OBJECTIVE: The first aim of this study is to generate a data set of generic health-related tweets in Arabic, labeled as either accurate or inaccurate health information. The second aim is to leverage this data set to train a state-of-the-art deep learning model for detecting the accuracy of health-related tweets in Arabic. In particular, this study aims to train and compare the performance of multiple deep learning models that use pretrained word embeddings and transformer language models. METHODS: We used 900 health-related tweets from a previously published data set extracted between July 15, 2019, and August 31, 2019. Furthermore, we applied a pretrained model to extract an additional 900 health-related tweets from a second data set collected specifically for this study between March 1, 2019, and April 15, 2019. The 1800 tweets were labeled by 2 physicians as accurate, inaccurate, or unsure. The physicians agreed on 43.3% (779/1800) of tweets, which were thus labeled as accurate or inaccurate. A total of 9 variations of the pretrained transformer language models were then trained and validated on 79.9% (623/779 tweets) of the data set and tested on 20% (156/779 tweets) of the data set. For comparison, we also trained a bidirectional long short-term memory model with 7 different pretrained word embeddings as the input layer on the same data set. The models were compared in terms of their accuracy, precision, recall, F1 score, and macroaverage of the F1 score. 
RESULTS: We constructed a data set of labeled tweets, 38% (296/779) of which were labeled as inaccurate health information, and 62% (483/779) of which were labeled as accurate health information. We suggest that this was highly efficacious as we did not include any tweets in which the physician annotators were unsure or in disagreement. Among the investigated deep learning models, the Transformer-based Model for Arabic Language Understanding version 0.2 (AraBERTv0.2)-large model was the most accurate, with an F1 score of 87%, followed by AraBERT version 2-large and AraBERTv0.2-base. CONCLUSIONS: Our results indicate that the pretrained language model AraBERTv0.2 is the best model for classifying tweets as carrying either inaccurate or accurate health information. Future studies should consider applying ensemble learning to combine the best models as it may produce better results.
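
The data set proportions quoted in this abstract follow from the reported counts. The sketch below recomputes them; the helper `pct` and the variable names are illustrative assumptions. Note that 623/779 is approximately 79.97%, so the abstract's "79.9%" appears to be truncated rather than rounded.

```python
# Counts as reported in the abstract above.
labeled_total = 1800   # tweets labeled by the two physicians
agreed = 779           # tweets on which both physicians agreed
train_val = 623        # tweets used for training and validation
test = 156             # tweets held out for testing
inaccurate = 296       # agreed tweets labeled inaccurate
accurate = 483         # agreed tweets labeled accurate


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


agreement_rate = pct(agreed, labeled_total)
train_val_share = pct(train_val, agreed)
test_share = pct(test, agreed)
inaccurate_share = pct(inaccurate, agreed)
accurate_share = pct(accurate, agreed)

print(agreement_rate, train_val_share, test_share)
print(inaccurate_share, accurate_share)
```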
